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1  Introduction  and  Summary 

The  proportioned  hazards  model  of  Cox  (19721  specifies  that  the  hazard  rate  for  an 
individual  with  covariate  vector  x  is 

\(t |x)  =  \(t)  exp(3'0x)  (1.1) 

where  3q  is  a  vector  of  unknown  regression  coefficients  and  A,  the  underlying  baseline 
hazard  rate,  is  an  unknown  and  unspecified  nonnegative  function.  In  most  cases 
where  this  model  is  used,  interest  centers  on  the  estimation  of  S0.  However,  it  is 
often  also  very  useful  to  investigate  the  effect  of  the  covariates  on  the  median  (or 
some  other  fixed  quantile)  of  the  survival  time.  Let  £p(x)  be  the  pL^  quantile  of  the 
distribution  of  the  lifelength  of  an  individual  with  covariate  vector  x. 

A  first  attempt  to  estimate  £p(x)  proceeds  as  follows.  Let  A(f|x)  and  5(f|x)  be  the 
cumulative  hazard  iuuction  and  the  survival  function,  respectively,  associated  with 
A(t|x).  and  let  A(<)  be  the  cumulative  hazard  function  associated  with  A.  We  then 
have 

A(<|x)  =  A (t)  exp(/?ox)*  (1-2) 

Using  the  continuity  implied  by  (1.1)  we  may  write 

S(t\x)  =  exp(-A(t)exp(££x)),  (U3) 

and  solving  the  equation  5(f  |x)  =  1  —  p  we  obtain 

£p(x)  =  A-1([—  log(l  -  p)}  exp(-^'x)).  (1.4) 

The  natural  estimate  for  the  right  side  of  (1.4)  is 

£p(x)  =  A-1([—  log(l  -p)jexp(-d'x)),  (1.5) 

where  3  is  Cox's  (1972)  maximum  partial  likelihood  estimate  of  30  and  A  is  the 
usual  “Nelson-Aalen"  type  estimator  of  A  (see  equation  (2.4)  of  Section  2).  (In  ( 1.4). 
(1.5).  and  throughout  the  paper,  for  an  arbitrary  increasing  function  f.  f~l  denotes 
the  right  continuous  inverse  of  /  defined  by  /_1(<)  =  sup{s  :  f(s)  <  <}).  See  Cox 
and  Oakes  (19S4.  pp.  10S),  Miller  and  Halpem  (19S3).  and  Dabrowska  and  Doksum 
(19S7).  See  also  Tsiatis  (19S1)  for  the  related  problem  of  estimating  S(fjx). 

A  second  approach  is  to  note  that  for  an  arbitrary  cumulative  hazard  function 
H ,  the  survival  function  co^esoonding  to  H  is  the  product  integral 

sM = no  -  w*)),  (i.6) 

s<t 

(see  Gill  and  Johansen  (  19S9)  or  Kalbfleisch  and  Prentice  (19S0.  sec.  1.2.3))  and  that 
the  pr^  quantile  of  1  —  5  is  (1  —  S)-1(p).  For  the  case  of  the  hazard  function  A(f  jx) 
given  by  (1.2),  this  gives 

iP(x )  =  supjs  :  1  -  —  A(du')  exp(3'0x))  <  p\-  (1.7) 

li  <  5 
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Substituting  A  for  A  and  0  for  3q,  we  obtain 

«fp(x)  =  sup ^ s  :  1  -  IK1  “  A(du)exp(/?'z))  <  p}.  (1.8) 

u<* 

The  estimate  (l.S)  has  sounder  theoretical  basis  than  does  (1.5).  Let  7i  be  the  space 
of  all  cumulative  hazard  functions  and  71  be  the  real  line.  The  function  mapping  an 
arbitrary  point  (A,  3)  £  7i  x  1Z  into  £p(x)  is  (1-7)  not  (1.4),  since  (1.4)  is  valid  only 
if  A  is  continuous. 

Before  settling  on  £p(x)  there  is  a  slightly  subtle  point  that  needs  to  be  raised. 
The  estimator  £p(x)  was  obtained  under  the  assumption  that  the  Cox  model  is  given 
by  (1.2).  However,  there  is  another  way  to  specify  the  model,  namely 

S(t\x)  =  S(iY'v{(3°x),  (1.9) 


where  S(t)  is  a  baseline  survival  function.  In  general,  the  models  (1.2)  and  (1.9) 
are  different,  although  they  agree  if  S  (or  A)  is  continuous.  Because  the  estimates 
of  A  and  of  0o  give  rise  to  discrete  distributions,  it  is  important  to  decide  on  the 
appropriate  specification  of  the  Cox  model  in  general.  We  note  that  if  continuous 
survival  times  7\ , . . . ,  Tn  arising  from  the  Cox  model  are  recorded  in  a  discrete  way 
(e.g.  the  survival  times  are  recorded  in  days),  then  the  resultant  survival  times  do 
not  follow  (1.2)  but  do  follow  (1.9)  (see  Kalbfleisch  and  Prentice  (19S0,  sec.  4.6)). 
So  we  take  (1.9)  as  the  specification  of  the  Cox  model  in  general.  An  elementary 
calculation  yields  that  in  this  setup,  the  cumulative  hazard  for  an  individual  with 
covariate  x  satisfies 

1  -  A(dt\x)  =  (1  -  A (df))expW  (1.10) 

where  A  corresponds  to  the  S  of  (1.9)  via  (1.6),  or  equivalently 


w-~L 


dS(s) 


The  estimate 


[O'*]  S(s-) 

ip(x)  =  sup{s  :  1  -  nu  -  A (<*«))“***>  <  pj 


(1-11) 


(1.12) 


li<  9 


is  obtained  by  substituting  A  for  A  and  0  for  0O  (other  estimates  of  A  can  be  used; 
see  for  example  Kalbfleisch  and  Prentice  (19S0,  sec.  4.3).  Our  Monte  Carlo  studies 
reported  in  Section  4  show  that  in  terms  of  mean  squared  error,  £p(x)  is  slightly 
better  than  £p(x)  and  that  each  of  these  noticeably  outperforms  fp(x).  The  three 
estimates  &*).  Cp(*)s  and  £p(x)  ail  have  the  same  first  order  asymptotics. 

Dabrowska  and  Doksum  (19S7)  considered  estimation  of  £p(z).  They  studied  the 
estimator  ifp(x)  (more  precisely,  the  estimate  (1.5)  with  Breslow's  (1972,  1974)  esti¬ 
mate  of  A  instead  of  M  obtained  tb*»  asymptntir  dic+ribuLicti  of  v»'(Cp(-T)  —  Cp(x )) 
as  a  process  in  p,  from  which  they  constructed  asymptotic  confidence  intervals  for 
tP(z).  They  also  provide  information  on  the  efficiency  of  the  semiparametric  estima¬ 
tor  £r( i)  vs.  the  optimal  estimator  in  certain  parametric  models. 
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In  this  paper  we  expand  on  the  work  of  Dabrowska  and  Doksum,  and  develop  con¬ 
fidence  bands  for  the  function  £p(x)  as  x  varies.  We  propose  two  types  of  confidence 
bands  (more  properly,  confidence  Gets).  One  type  we  refer  to  as  “simulated-process 
bands";  the  other  is  based  on  the  bootstrap.  The  development  of  the  simulated- 
process  bands  is  based  on  weak  convergence  results  of  the  following  type.  Let  p  be 
fixed  and  let  x  vary  over  a  set  K.  Then,  as  n  — >  oc, 

y/n(lv(x)  -  £p(x))  — *  Vp(x)  in  distribution  (1.13) 

where  Vp  is  a  Gaussian  process  defined  on  I\.  Since  x  is  multidimensional,  this  result 
involves  weak  convergence  of  multiparameter  stochastic  processes.  (A  more  precise 
version  of  this  result  appears  as  Part  (i)  of  Corollary  2.1  below).  To  use  a  result  such 
as  (1.13)  to  construct  confidence  bands  for  £ p(x ),  we  need  to  obtain  the  distribution 
of  supx6A-  |Vp(x)|.  This  is  in  general  quite  impossible,  as  is  explained  in  Section  2. 

One  strategy  is  to  give  the  process  Vp  a  representation  that  makes  possible  a 
simple  way  to  simulate  the  distribution  of  supr6^-  |lp(x)|  on  a  computer.  (In  fact, 
the  covariance  structure  of  Vp  depends  on  unknown  parameters.  The  representation 
makes  it  possible  to  simulate  the  distribution  of  suprg^  jVp(x)|,  where  the  covariance 
structure  of  Vp  is  a  consistent  estimate  of  the  covariance  structure  of  Vp.)  We  simulate 
many  copies  of  supIgK-  J V^(x)]  and  use  them  to  obtain  the  required  critical  constants. 

The  construction  of  the  bootstrap  bands  follows  Hjort  (1985a)  and  is  straightfor¬ 
ward.  Below  we  give  a  sketch  of  it  in  the  case  of  no  censoring;  a  detailed  description 
is  given  in  Section  2.3.  The  Cox  model  (1.9)  has  two  unknown  parameters,  5  and  3o- 
We  estimate  those  by  5  and  /?,  where  S  is  defined  by  5(f)  =  I7a<t(l  —  A (ds)).  For 
individual  i,  we  generate  an  artificial  lifelength  from  the  estimated  model  (1.9),  i.e. 
generate  a  lifelength  from  the  survival  function  (5(f))exp^'I,b  From  these  artificial 
lifelengths  we  calculate  the  function  £'(x),  x  6  I\  and  form  v/”(£p(x)  —  fp(x)),  x  £  I\. 
The  bootstrap  principle  is  that  if  (S,/3)  is  close  to  (5,  /?o),  then  the  distribution  of 
suPx6A'  \/^]£P(x)  —  £p(x)I  is  close  to  that  of  sup_gK  v/rc|£p(x)  —  £p(x)|-  Thus,  the  re¬ 
quired  critical  constants  are  obtained  from  the  large  number  of  copies  of 

suPxeA- V”l^(x) 

Actually,  we  discuss  several  kinds  of  simulated-process  bands  and  compare  these 
and  the  bootstrap  bands  in  terms  of  width  and  coverage  probability  in  Monte  Carlo 
studies.  We  provide  theorems  which  state  that  both  the  simulated-process  type  bands 
and  the  bootstrap  bands  have  asymptotically  correct  coverage  probabilities. 

This  paper  is  organized  as  follows.  Section  2  gives  statements  of  the  mam  theo¬ 
retical  results  of  the  paper,  and  describes  in  detail  our  approach  to  constructing  the 
simulated-process  and  the  bootstrap  bands,  and  “equal-precision”  versions  thereof 
analogous  to  the  equal-precision  bands  developed  by  Nair  (19S4)  and  Hjort  (19S5b) 
in  different  contexts.  Section  3  gives  technical  details  on  how  the  bands  were  formed 
and  sdro  details  on  how  the  computafons  were  caiiicd  out.  Section  4  reports  Monte 
Carlo  studies  that  compare  all  the  bands.  Perhaps  surprisingly,  these  studies  indi¬ 
cate  that  the  simulated-process  bands  perform  very  well  both  in  terms  of  width  of  the 
bands  and  in  terms  of  coverage  probability,  even  for  moderate  sample  sizes.  Section  4 
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also  reports  Monte  Carlo  studies  that  compare  the  three  estimators  sCp(-).  sfP(  x)  and 
|p(x).  Section  5  illustrates  the  various  bands  on  the  Stanford  Heart  Transplant  Data. 
The  Appendix  provides  proofs  of  the  results  stated  in  Section  2. 

There  are  numerous  examples  of  situations  where  one  can  establisn  weak  conver¬ 
gence  to  a  process  whose  distribution  is  intractable,  and  also  depends  on  unknown 
parameters.  In  many  cases  one  can,  by  a  suitable  transformation,  reformulate  the 
weak  convergence  result  in  such  a  way  that  the  distribution  of  the  limiting  process  is 
more  tractable,  and  does  not  depend  on  unknown  parameters.  For  example  Efron’s 
(1967)  and  Hall  and  Wellners  (19S0)  transformation  of  the  Kaplan-Meier  curve  yield 
weak  convergence  to  a  Brownian  Motion  and  a  Brownian  Bridge,  respectively,  and 
Nairs  (19S4)  and  Hjort’s  (19S5b)  rescaling  transformations  of  the  Kaplan-Meier  curve 
and  the  Nelson-Aalen  estimator,  respectively,  result  in  a  weak  convergence  to  a  time 
transformed  Omstein-Uhlenbeck  process.  In  more  complicated  situations  (e.g.  the 
Kaplan-Meier  quantile  process  or  the  process  S(-)  in  the  Cox  model)  such  transfor¬ 
mations  are  not  available.  We  hope  that  the  technique  of  using  the  simulated  process 
will  prove  useful  in  these  situations. 


2  Notation  and  Theoretical  Results 

2.1  Notation  and  Basic  Results 


We  consider  a  study  involving  n  individuals  where  for  individual  i,  A”,  is  a  g-dimensional 
vector  of  covariates,  T'  is  a  lifetime,  and  C,'  is  a  censoring  time.  We  observe  (7),  £,,  A",), 
where  T,  =  min(K,  C,)  and  <5,-  =  I{Yt  <  Ct),  JY.4)  being  the  indicator  function  of  the 
set  .4.  Define  the  counting  processes 


and  the  processes 


N%(t)  =  J(T,  <  t:  6,  =  1)  t>0 

J,(t)  =  I{T,  >t)  t>  0. 


(2.1) 

(2.2) 


In  this  notation,  conditional  on  A',  =  x,-,  i  =  1, - n.  the  partial  likelihood  of  30  at 

time  r  is 


L(3t')-  TT  TT  ( 

„e[0.r]i:i  V&i  7J(u)exp(/?'xt) 


d  A’,(u) 


(2.3) 


The  maximum  partial  likelihood  estimator  of  30  at  time  r  is  the  value  3  =  3(r)  of  3 
that  maximizes  L(3,r).  In  practice,  of  course,  one  uses  the  value  of  3  that  maximizes 
the  partial  likelihood  at  time  oc:  see  the  discussion  in  Section  4  of  Andersen  and  Gill 
i'19S2)  (henceforth  AG).  The  ‘'Nelson- Aalen5’  estimator  of  A  is 


A(f) 


Jo  j=i  .=1 


(2.4) 


The  estimator  A (t)  increases  only  by  jumps,  which  occur  at  the  uncensored  deaths. 
If  we  linearly  interpolate  A (t)  we  obtain  Ac(f),  Breslow's  (1972.1974)  estimator  of  A. 
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We  remark  that  if  30  is  known  to  be  0,  i.e.  it  is  known  that  there  is  no_ covariate 
effect,  and  if  there  is  no  censoring,  then  the  distribution  corresponding  to  A (#)  is  the 
usual  empirical  distribution  function.  This  is  not  the  case  for  A'  '(<)• 

The  notation  below  follows  closely  that  of  AG,  on  whose  results  we  rely  heavily. 
For  a  5-vector  w  =  (uq, . . . ,  tw,),  w denotes  the  5  x  q  matrix  whose  (i,j)ih  entry 
is  wxu'j.  Define 


S{0)(3,t) 


s(2,(/M) 

E(M 


n  /S 

=  -^x,Ji(t)exp(/Tx,), 

n  1=1 

=  -  ^xP2J,(f)exp(/?'x,), 
n  1=1 


(2.5) 


and 


V'OM) 


sl2,(f),t) 

S<°>(0,t) 


(E(0,t))®7. 


We  assume  that  the  vector  (Ari, . . . ,  JV„)  forms  a  counting  process  with  respect  to 
some  nitration  (JFt,  i  >  0).  (This  is  the  case  for  example  if  the  Yt's  are  independent, 
and  at  time  t  it  can  be  determined  whether  or  not  censoring  has  taken  place,  i.e. 
censoring  is  predictable;  this  does  not  require  the  C,-’s  to  be  independent).  A  good 
heuristic  treatment  of  the  concepts  involved  here  is  Gill  (1984). 

Fix  p(1)  €  (0, 1)  and  let  K  be  a  bounded  rectangle  in  Hq .  Define 


r  =  sup {£p(x);p  €  [0,p(1)],x  €  A'}. 


(2.6) 


Assume  that  for  some  e  >  0,  A  is  continuous  and  positive  on  [0.  r  -f  e]  and  assume 
Conditions  A-D  of  AG(1982)  with  the  interval  [0,  r  +  e]  in  place  of  [0.  l].  Condition  B 
states  that  there  exist  scalar,  vector,  and  matrix  functions  and  s(:),  respec¬ 

tively,  of  3  and  t.  to  which  S(CA  S(1),  and  converge  uniformly  over  a  neighborhood 
of  3o  and  over  [0,  t  -f  ej.  Let 


e(M 


sM(3,t) 


and 


sW(.3,t) 


(e(/U))®2- 


(2.7) 


Then  e  is  a  5-vector  and  v  is  a  matrix.  Define  S  =  B(r)  by 


S=  r  v(Xi)^(3o,t)dMt) 

Jo 

and  assume  that  D  is  positive  definite.  Also  define 


(2.S) 


.Vote  that  b(t)  is  a  vector.  For  R  a  bounded  dosed  rectangle  in  7Z m.  Crrx(R)  wiil 
denote  the  space  of  real  valued  continuous  functions  denned  on  R  with  the  sup  norm 
t  opology. 

Theorems  1,  2,  and  3  below  concern,  essentially,  the  limiting  distributions  of  the 
processes  Vn(£p(x)  -  £p(x)),  y/n(£p(x)  -  £p(x)),  and  ./n(iP(x)  -  £p(x))  and  of  some 
of  their  relatives  as  (p,  x)  ranges  over  [0,p^]  x  K.  These  involve  the  space  Cq+l  = 
C7+i([0,p(1)]  x  I\).  Corollary  2.1  concerns  the  limiting  distribution  of  these  processes 
when  p  is  fixed  and  x  is  scalar  and  involves  the  familiar  space  C  =  C(A').  For 
technical  reasons,  we  find  it  necessary  to  work  with  Cm(A)  instead  of  the  “Skorohod 
space”  Dm(R)  (see  the  Appendix). 

Let  (S(-|x))  be  the  function  obtained  by  linearly  interpolating  S(-|x),  and  let 

|p(x)  be  obtained  by  solving  for  t  in  the  equation  ^5(f|x))  =  1  —  p.  This  £p(x) 
is  continuous.  Details  on  the  exact  definition  of  <fp(x)  appear  in  Section  4.  Here, 
we  simply  note  that  all  the  estimators  are  asymptotically  equivalent  uniformly  in 
p  €  [0,p^],x  6  A';  this  is  made  clear  in  the  proof  of  Theorem  1.  We  shall  use  <fp(x) 
to  refer  to  either  £p(x)  or  (p(x)  except  in  the  exact  statements  of  the  theorems  and 
in  the  proofs  in  the  Appendix. 

In  the  results  below,  W(~)  denotes  a  standard  Brownian  Motion  on  [O.oo)  and 
Z  a  <?- dimensional  random  vector  that  is  normally  distributed  with  mean  0  and 
identity  as  covariance  matrix,  and  is  independent  of  IV(-).  We  use  the  notation 
-  =  log(l  —  p);  also,  denotes  convergence  in  distribution,  and  denotes 
convergence  in  probability. 

Theorem  1  Under  the  assumptions  stated  above,  as  n  — ►  oo 

Vn(£p(x)  -  £p(x))  V(p,x) 
in  Cq+ 1  (j0.p(1)l  x  I\) ,  where 

v(p'x>  =  mu  - + ( - mm - jz- 


Theorem  1  indicates  that  the  variance  of  V(p.x)  is 
2.  ,  a(Zt>(x))  ,  fbttrix))  +  «xexp(-Joi)>'r,-i  fb{£p(x))  -  ~x exp(-^x)> 

,(p’I)  =  (W?r( - TTT) - ( - TFT - ) 

(2.10) 

From  (2.10)  we  see  that  to  estimate  crip,  x)  we  need  to  estimate  the  functions  a(-),  b(-) 
and  the  matrix  E:  in  addition,  we  shall  need  to  estimate  A(-).  For  ai •).&{•)  ana  E. 
we  introduce  the  natural  estimates 


ait) 

b(t) 
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(2.11) 


and 


Z  =  f  V(P,t)S^(3,t)dA(t). 

Jo 

The  estimation  of  A  is  somewhat  more  difficult.  Possibilities  include  methods  based 
on  spiines  (Whittemore  and  Keller,  19S6)  and  those  based  on  kernel  smoothers.  Ker¬ 
nel  smoothers  are  computationally  convenient,  and  in  addition  their  asymptotic  prop¬ 
erties  in  the  present  context  have  already  been  studied  by  Ramlau-Hansen  (19S3). 
To  describe  them,  let  R  be  a  function  of  bounded  variation  with  support  on  [  —  1, 1], 
and  whose  integral  is  1,  and  let  { bn }  be  a  sequence  of  positive  constants  such  that  as 
n  — f  oo,  we  have  bn  — ►  0  and  nb7n  — *  oo.  Define  the  kernel  estimate  of  A(-)  by 

(2-i2) 

The  specific  choices  of  R  and  {£n}  are  discussed,  in  the  context  of  ordinary  density 
estimation,  in  Silverman  (19S6.  pp.  40-72).  See  our  discussion  in  Section  4.  Having 
specified  a  choice  of  R  and  {6n}  we  may  define  an  estimate  a(p,x)  of  a(p,x)  by 
substituting  in  (2.10)  a(-),  b(-),  E,  £p(x),  /?,  and  A(-)  for  a(-),  £>(•),  E.  £p(x),  3,  and  A(-), 
respectively.  The  following  lemma  gives  uniform  consistency  of  this  estimator. 

Lemma  2.1  Under  the  conditions  stated  above,  a  n  — *  oo 

(0  sup0<j<T+{  \a(t)  -  a(i)|  0 

(*0  suPo<t<T+4  I Kt)  -  6(i)|  0 

(m)  E  E 

(iv)  suppe[0ip(1)]tXeA.  ||p(x)  -  <fp(x)|  0 

(v)  for  any  tj  >  0, 

suPp<(<T  \\(t)  -  A(t)|  0 

(vi)  for  any  pP'1  >  0, 

supp6[p(o))P(1)],x€A'  I °(p,x)  ~  <r(p,x) I  0 

(vii)  Let  ac  and  bc  be  given  by  (2.11),  except  il:i  A  is  replaced  by  the  version  of 
A  obtained  by  linearly  interpolating  A.  Then  me  conclusion  of  Pari  (vi)  is  still 
true  for  the  version  ac(p,x)  of  b(p.x)  obtained  by  using  ac.bc  and  £p(x)  instead 
of  a ,  b  and  £p(x). 


Suppose  that  we  were  able  to  obtain  ca ,  the  (1  —  a)th  quantile  of  the  distribution  of 
suPP€^'|»yi)],*eA'  |V'(p,  x)j.  We  would  then  have 


lim  P 


U*)-^<Ux)<Ux) 


for  all  p  £  [p(0),pP)l,x  £  K 
y/n 


1  —  Q. 


i.e.  a  i'l  —  q)  x  100%  asymptotic  confidence  band  for  £p(x)  is  £p(x)  ±  ca/'v/n.  This 
Kolmogorov- Smirnov  type  of  confidence  band  suffers  the  defect  that  it  has  constant 


width.  One  would  want  the  band  to  be  narrower  at  those  values  of  (p.  xj  where  the 
variance  of  £^(x)  sma-h.  the  context  of  forming  confidence  bands  for  a  survival 
function  in  the  random  censorship  model  of  survival  analysis.  Nair  (19S4)  proposed 
a  confidence  band  with  the  property  that  the  width  of  the  band  at  a  given  point  is 
proportional  to  the  estimated  standard  deviation  at  that  point.  He  called  the  band 
'"equal-precision  band”.  A  similar  idea  was  proposed  by  Hjort  ( 19S5b)  for  estimation 
of  cumulative  hazard  rates.  We  proceed  along  a  similar  route. 

Theorem  2  Under  the  regularity  conditions  stated  above .  as  n  —*  oo 

/^p(t)  ~  d  '  V(P,X) 

'  crc(p,x)  •  cr(p,x) 

in  CqJrl  ([p(°Kp(1)]  X  A'). 


2.2  The  Simulated  Process  Bands 


Our  procedure  for  using  Theorem  2  to  construct  confidence  bands  amounts  to  esti¬ 
mating  the  parameters  of  the  process 


L(p,x)  = 


v(p,z) 

a(p,x) 


and  then  generating  a  process  L(p,  x)  with  those  estimated  parameters.  To  describe 
it  in  more  detail,  note  that  tve  may  write 

L(p,x)  =  c(p,  x)W*(a(£p(x)))  +d(p,x)Z  (2.13) 


vnere 


p,x  ,  =  a(£P(x))  +  (i>(£p(x))  +  7Txexp(-.i?ox))  S  1  (KCp(x))  -r  "x  exp(- 3'0x 


d(p-.  ~  '  =  (&(£*(*))  4-  -xexp(-/?ox))  S  1/2 

[a(£P(x))  +  (&(£p(x))  +  ~xexp(-.j?ox))  T_1  (b(«fp(x))  +  nxexp(-3'0x 


(2.14) 


Let  c'p.x)  and  d(p,x)  be  the  estimates  of  c(p,  x)  and  <f(p,  x)  obtained  by  replacing 
all  the  unknowns  by  their  continuous  estimates.  We  generate  the  process 

i(p,x)  =  c(p,x)U'(a'({'(x)))  +d(p.x)Z  (2.15) 

where  U’(-)  is  a  standard  Brownian  motion,  Z  is  an  independent  g-variate  standard 
normal  variable  and  (I V(-).Z)  is  independent  of  all  parameter  estimates,  and  cal¬ 
culate  M  =  supp^oj.pUjj.j.g/v  |L(p,  x  )|.  To  estimate  the  (1  -  a)th  quantile  of 
the  distribution  of  M ,  we  repeat  the  above  step  independently  n0  times,  for  some 


S 


large  number  n0.  obtaining  iid  copies  M . Mno  of  M.  and  take  the  empirical 

(1  —  a  f'h  quantile  of  . Mno  as  an  approximation  to  Tus  produces  the 

band 


c 

sp 


(x)  ±5 


(") 


cr(p,z)/Vn. 


(2.16) 


Theorem  3 


Under  the  regularity  conditions  stated  above,  as  n  — <•  oo 


(Zr(z)  ~  fp(z))| 


a(p,x) 


<  s ^  for  all  p  6  [p^,  p^],  x  £  I\ 


1  —  Q 


Thus  the  band  (2.16)  has  asymptotic  coverage  probability  1  —  a. 

2.3  The  Bootstrap  Bands 

Suppose  that  the  censoring  variables  can  be  thought  of  as  being  iid  from  some  survival 
function  Rc ,  and  also  independ'  at  of  the  survival  Limes.  If  we  view  the  A','s  as  fixed 
at  x{,  the  Cox  model  is  then  specified  by  the  triple  (S,  ^0,  Rc)-  Let  Rc  be  the  Kaplan- 
Meier  estimate  of  Rc-  A  natural  way  to  resample  (see  Hjort  (1985a))  is  to  generate 
artificial  data  as  follows. 

For  each  i  =  1, ....  n 

1  Generate  Y’  ~  (S(-))exp^r^ 

2  Generate  C*  ~  Rc  (ail  observations  independent) 

3  Form  Tf  =  min (Yf,  C-)  and  <5*  =  I{Y‘  <  C‘) 

This  gives  one  artificial  data  set  (T‘,  6‘,  x,),  i  =  1 . n  from  which  we  calculate 

A’  and  F",  and  we  use  those  to  construct  the  function  £*(x)  (and  also  £’c(r).  for  the 
technical  statement  of  Theorem  4).  This  is  repeated  independently  a  large  number 
of  times. 

Efron  and  Tibshirani  (19S6)  also  discuss  the  scheme  of  bootstrapping  by  resam¬ 
pling  from  the  triples  ( Tt.  6,,  A',),  i  =  1. ....  n. 

Suppose  that  we  wish  to  estimate  the  variability  of  some  estimate,  in  our  case 
fp{x).  The  first  method  mentioned  is  appropriate  for  estimating  the  conditional 
variance  of  £p(x)  given  the  A"s.  The  second  method  is  appropriate  for  estimating 
the  unconditional  variance  of  fp(x),  i.e.  averaging  over  the  marginal  distribution  of 
the  covariates  and  of  the  censoring  variables.  If  the  distribution  of  the  A\‘s  does  not 
depend  on  the  unknown  parameters  5  and  then  the  usual  ancillaritv  arguments 
point  to  the  conditional  variance  as  the  “right”  quantity  to  estimate.  This  is  our 
point  of  view,  and  the  results  below  pertain  to  the  first  method  of  resampling. 

This  situation  is  closely  connected  to  bootstrapping  in  linear  regression  models, 
where  one  can  bootstrap  by  resampling  from  the  pairs  (responses,  covariates),  or  one 
can  bootstrap  by  resampling  from  the  residuals:  see  Freedman  (1981).  Many  of  the 
comments  in  the  discussion  paper  Wu  (1986)  are  relevant  here. 
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To  show  that  the  bands  axe  asymptotically  valid  we  need  to  show  that  for  large  n. 
the  distribution  of  -  £p(x))  is  close  to  that  of  y/n{£p(x)  -  <fP(x )).  The  tech¬ 

nical  statement  of  this  fact  involves  the  notion  of  “weak  convergence  in  probability", 
and  is  given  as  Theorem  A.l  in  the  Appendix. 

To  construct  the  bands  based  on  the  bootstrap,  let 
U’*  =  supp6[0  p(i)]  xeA-  v/n|f*(x)  —  £p(x)|/<7(p,  x).  Obtain  a  large  number  of  copies  of 
w',  say  . . . ,  w’g  and  let  b ”  be  the  (1  —  a)tk  quantile  of  the  empirical  distribution 
°f  ^'l  >  •  •  •  i  WB- 

Theorem  4  Assume  the  conditions  of  Theorem  1.  If  sampling  is  carried  out  via 
steps  1,2,  and  3.  then  the  band 


(p{x)  ±  b^a(p,  x)/\/u 
has  asymptotic  coverage  probability  1  —  a. 

Remark:  In  our  simulation  studies,  we  have  used  the  bootstrap  estimate  of  stan¬ 
dard  error  instead  of  a(p,x).  That  is,  before  bootstrapping  to  get  the  critical  con¬ 
stants  as  described  above,  the  standard  error  function  of  the  process  is  estimated 
using  a  separate  set  of  bootstrap  samples.  This  does  not  require  the  estimation  of 
the  hazard  rate,  but  is  much  more  computationally  intensive. 

2.4  Extensions  and  Special  Cases 

There  are  several  corollaries  and  extensions  to  Theorems  1-4  that  could  be  stated. 
Examples  include  the  following. 

a  The  case  of  p  fixed  at  pQ,  and  xl5 . . . ,  Xf  fixed  while  x(+1, . . . ,  xq  vary  freely.  Weak 
convergence  then  takes  place  in  the  space  C7_;. 

b  The  case  of  p  fixed  at  p0,  and  x  varying  freely  through  a  smooth  subset  5  of  the 
rectangle  I\.  An  example  of  this  arises  if  the  model  is  a  polynomial  regression  i  i 
the  scalar  v .  i.e. 


A(iju)  =  X(t)  exp(dju  +  32u'  —  . . .  -f-  8dud). 

and  then  5  is  simply  a  line  through  I\ .  Weak  convergence  takes  place  in  the  space 
of  continuous  functions  defined  on  5.  An  example  of  this  is  the  Stanford  Heart 
Transplant  Data  (see  Section  5)  in  which  we  used  a  secjrd  degree  poivnomial. 

ror  the  sake  a  reference  we  state  (Coredary  2.1  below)  our  results  for  the  simple  but 
important  case  of  p  fixed  at  p0  and  of  scalar  x  varying  over  the  set  K  =  [I\\,  K?}. 
1  he  usual  assumptions  are  in  force  (in  particular  we  assume  that  A  is  continuous  ana 
positive  on  [0,  supA-1<r<A-2  <fPo(x)  -f  e]).  We  denote  V(p0,x)  and  a{p0,x )  by  rw(x) 
and  Cp,(x).  respectively;  $1?^(pq)  is  the  (1  —  a )th  quantile  of  the  distribution  of 
suP/\,  <r<A'j  i^fpo,  r  ,!,  and  b^\p0)  is  the  corresponding  quantity  for  the  bootstrap 
process. 
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Corollary  2.1  Fix  p  at  p0  and  let  the  scalar  x  vary  over  [A'i  ,  K?].  Then  as  n  — >  oc 
(0  Vn(^0(x)  -  U{x))/crpo{x)  -£■*  VpQ ( x ) / cTp0 ( i )  in  C[KUK?} 


( i  i )  TAe  bands 


ipo(x)  ±  •sL")(Po)^(po,x)/%/n 


and 

(Po(x)  ±  ^n)(p0)o-(po,^)/v/« 


each  have  asymptotic  coverage  probability  1  —  a. 


3  On  Computation  of  the  Bands 

In  this  section  we  begin  by  discussing  two  topics  pertaining  to  the  simulated-process 
bands — estimating  the  hazard  rate,  and  forming  another  type  of  simulated-process 
band  by  using  the  asymptotic  distribution  of  log  £p(x).  Then  we  remark  on  papers  of 
Hall  on  how  many  bootstrap  replications  are  needed  in  forming  confidence  intervals. 
Finally  we  give  information  on  the  computers  and  programs  which  were  used  in 
carrying  out  the  studies. 

The  simulated-process  equal-precision  band  is  given  in  (2.16),  (2.13),  and  (2.14), 
with  formulas  for  estimates  of  the  parameters  of  the  limiting  process  given  by  (2.11) 
and  (2.12).  The  kernel  estimate  of  the  hazard  rate,  however,  is  not  fully  defined  by 
(2.12);  the  kernel  function  and  the  bin  width  must  be  specified.  It  is  well  known 
that,  whereas  the  choice  of  kernel  function  is  not  critical,  the  choice  of  bin  width 
is.  Silverman  (1986)  emphasizes  in  particular  that  the  fixed-width  kernel  estimator 
is  defective  when  applied  to  long-tailed  distributions:  “Because  the  window  width 
is  fixed  across  the  entire  sample,  there  is  a  tendency  for  spurious  noise  to  appear 
in  the  tails  of  the  estimates;  if  the  estimates  axe  smoothed  sufficiently  to  deal  with 
this,  then  essential  detail  in  the  main  part  of  the  distribution  is  masked"  (Silverman. 
19S6.  p.  18).  In  Chapter  5  of  his  book,  he  discusses  the  adaptive  kernel  method, 
which  “is  based  on  the  common-sense  notion  that  a  natural  way  to  deal  with  long¬ 
tailed  densities  is  to  use  a  broader  kernel  in  regions  of  low  density”  (ibid.,  p.  100). 
Silverman  deals  only  with  density  estimates  based  on  iid  samples,  and  in  our  problem 
we  need  an  estimate  of  the  hazard  rate  from  non-iid  data;  however,  his  remarks  are 
pertinent  here  since  in  survival  analysis,  densities  very  often  are  long-tailed,  and 
in  addition,  as  time  goes  on,  there  is  less  accurate  information  on  the  hazard  rate 
available  from  the  data  due  to  censoring  as  well  as  earlier  deaths.  In  a  paper  primarily 
on  asymptotics  for  the  kernel  estimator  of  the  hazard  rate  in  the  random  censorship 
model  of  survival  analyst,  Ramlau-Hansen  (19S3)  illustrates  the  kernel  method  on  a 
data  set  for  which  he  chooses  in  an  ad  hoc  manner  three  intervals  of  the  time  axis  and 
different  bin  widths  for  each  interval,  such  that  the  bin  width  increases  with  time. 
In  this  paper  we  use  the  biweight  kernel 

;?<()  =  |f|<i  (3.D 

lo 
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We  have  followed  Ramiau- Hansen  in  allowing  bin  widths  to  increase  as  time  increases, 
but  we  need  an  automatic  choice  of  bin  widths  for  the  simulation  studies.  Noting 
Ramiau- Hansen's  criteria  for  uniform  consistency  of  the  kernel  estimator  of  the  haz¬ 
ard  rate,  that  6n  — * •  0  and  no*  — »  co  as  n  —  oc.  the  bin  widths  were  taken  to  be 
inversely  proportional  to  n1/>3.  Then  with  the  data  from  one  simulation,  a  subjective 
choice  of  time  intervals  and  bin  widths  was  made.  This  choice  was  used  for  all  the 
simulations.  This  particular  procedure  by  no  means  leads  to  an  optimal  choice  of  bin 
width,  but  optimality  is  not  our  goal  here — we  want  to  see  how  well  the  simulated- 
process  band  performs  with  a  merely  reasonable  choice  of  bin  width.  Another  method 
to  obtain  an  automatic  choice  of  bin  width  would  be  to  modify  the  algorithm  given 
by  Tanner  (1984),  which  is  for  estimation  of  the  hazard  rate  function  in  the  random 
censorship  model  with  no  covariates.  This  algorithm  applies  cross-validation  to  the 
iog-Iikelihood  function  to  select  three  smoothing  parameters. 

Preliminary  studies  indicated  that  a  log  transformation  greatly  improved  the  cov¬ 
erage  probability  of  the  simulated-process  band  in  many  cases.  For  the  problem  of 
forming  a  confidence  band  for  the  cumulative  hazard  function  Aft)  in  the  random 
censorship  model  without  covariates.  Bie.  Borgan.  and  Liestol  (19S4)  found  that 
bands  based  on  transformations  of  the  Nelson-Aalen  estimator  A(t)  had  much  closer 
to  nominal  coverage  probability  than  bands  based  on  the  untransformed  A(t),  in  sev¬ 
eral  Monte  Carlo  studies  with  exponential  and  Weibull  survival  times  and  exponential 
and  uniform  censoring  times.  Ivalbfieisch  and  Prentice  (1980,  p.  15)  mentioned  that 
parameter  transformations  do  not  seem  to  have  been  considered  for  survival  analysis 
inference  problems,  although  parameter  transformations  can  improve  the  adequacy 
of  normal  approximations  and  avoid  the  problem  of  impossible  values  occurring  in  a 
confidence  interval  or  band. 

In  the  present  situation,  an  argument  for  the  log  transformation  is  that  it  is  a 
variance-stabilizing  transformation  in  a  special  case.  When  the  variance  of  a  univari¬ 
ate  observation  X  is  some  function  of  its  mean,  a  variance-stabilizing  transformation 
is  a  function  g  such  that  the  variance  of  g ( X )  does  not  depend  on  the  mean.  The  'og 
transformation  is  variance-stabilizing  when  the  standard  deviation  is  proportional  to 
the  mean.  In  the  present  situation,  consider  a  single,  fixed  vaiue  of  the  covariate  x. 
Call  the  function  g  variance-stabilizing  at  x  if  the  limiting  variance  of  c(fp(x ))  does 
not  depend  on  £p(x).  For  fixed  x,  this  limiting  variance  depends  on  the  covariate, 
lifetime,  and  censoring  distributions,  as  well  as  on  the  Cox  model  regression  param¬ 
eter  3q.  To  simplify  matters,  consider  the  following  particular  situation:  a  single 
covariate  distributed  uniformly  on  (0.1),  and  exponential  distributions  for  both  the 
lifetime  and  censoring  variables.  A  formula  for  the  limiting  variance  of  fp(x)  can 
be  written  down.  Tins  formula  did  not  turn  out  to  have  a  nice  form:  it  involves 
integrals  which  must  be  evaluated  numerically.  Following  through  to  the  end  of  this 
argument,  the  calculations  were  done  for  several  specific  choices  of  the  parameters  of 
the  exponential  distributions  of  the  lifetime  and  censoring  variables,  in  order  to  plot 
the  limiting  variance  of  fp(x)  as  a  function  of  fp(xi.  In  each  case,  the  resulting  curve 
appeared  very  ciose  to  quadratic,  suggesting  the  choice  of  the  log  transformation. 
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This  argument  is  by  no  means  decisive:  What  v.-e  actually  want  is  a  normalizing 
transformation,  which  is  not  necessarily  the  same  as  a  variance-stabilizing  transfor¬ 
mation.  but  it  is  not  known  if  such  a  transformation  exists  for  this  problem.  (Efron 
( 1 9 S 2 )  discusses  the  problem  of  finding  normalizing  transformations  in  the  case  of  a 
one-parameter  family  of  distributions.)  Our  argument  merely  serves  to  suggest  that 
we  investigate  the  performance  of  the  log-transformed  bands  through  Monte  Carlo 
studies. 

A  standard  delta-method  argument  is  used,  via  Theorem  2  in  Section  2,  to 
form  the  log-transformed  equal-precision  band.  Let  g  be  a  differentiable,  monotone 
function  on  Then  Vn(s(fp(*))  -  sCW1)))  —  s'(fp(z))v'(p,x)  on 

CK,(i,  )],  and  so  the  standard  deviation  function  of 

is  g'(£p(z))cr(p,  x).  To  get  the  equal-precision  band  for  g(sp(x))5  write 

%/”($(£>(*))  -g(£p(J)))  a-  V(p,x) 
g'(Zp(x))a(p,  x)  cr(p,x ) 

on 

So  the  100(1— a)%  confidence  bandfor  g(^(x))  is  g(£p(z))±g'(ZP(x ))a(p.  x)sa/ y/n, 
where  sQ  is  the  critical  constant  for  the  equal-precision  band.  For  a  given  monotone 
function,  this  band  may  be  converted  into  a  100(1  —  a)%  band  for  £p(x).  For  the  log 
transformation  the  band  is 


sP(z)exp 


*{±77 


cr(p,x) 


lp(x) 


The  bootstrap  resampling  scheme  and  equal-precision  band  are  described  in  Sec¬ 
tion  2.3.  In  determining  number  of  bootstrap  replications,  we  noted  two  papers  by 
Hall  (19S6a.b).  In  the  situation  of  iid  vector  observations  in  which  the  parameter  of 
interest  is  a  function  of  the  mean  vector,  he  gives  expansions  for  the  probability  of 
coverage  of  one-sided  bootstrap  percentile-i  intervals.  He  then  uses  these  results  to 
show  that  in  the  situation  he  considers,  a  very  small  number  of  bootstrap  replications 
is  adequate  to  get  very  close  to  the  true  bootstrap  coverage  probability,  that  is.  the 
coverage  probability  with  an  infinite  number  of  bootstrap  replications.  He  actually 
shows  that  for  any  fixed  finite  number  of  bootstrap  samples,  the  worst  departure  of 
actual  coverage  probability  from  nominal  coverage  probability,  over  coverage  proba¬ 
bilities  for  a  €  (0.1],  is  less  than  or  equal  to  the  worst  departure  for  an  infinite  number 
of  bootstrap  replications.  He  cautions  that  his  result  does  not  mean  that  one  should 
use  a  small  number  of  replications  in  practice,  for  such  bootstrap  intervals  will  be 
wider  than  those  derived  from  a  large  number  of  bootstrap  repiications.  Hall's  theory 
is  quite  specialized:  His  situation  does  not  include  estimation  of  the  parameters  in 
the  Cox  model,  and  he  does  not  consider  confidence  bands.  Monte  Carlo  studies 
indicated,  however,  that  the  coverage  probability  of  our  bootstrap  bands  when  oniy 
'-9  bootstrap  repiications  were  used  was  usually  surprisingly  close  to  the  coverage 
probability  when  1000  bootstrap  repiications  were  used.  Except  for  these  papers,  it 
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would  not  have  occurred  to  us  that  coverage  probability  could  be  so  little  affected 
by  taking  a  very  small  number  of  bootstrap  replications.  Bootstrapping  in  the  Cox 
model  requires  a  large  amount  of  CPU  time,  and  using  a  small  number  of  bootstrap 
replications  in  the  debugging  stage  was  an  easy  way  to  save  time.  We  mention  this  in 
the  hope  that  using  only  a  small  number  of  bootstrap  samples  will  prove  useful  in  the 
early  stages  of  other  empirical  studies  of  the  bootstrap,  even  when  Hall's  conditions 
are  not  met. 

No  transformation  of  the  bootstrap  bands  is  considered  here.  Since  the  simulated- 
process  bands  require  fairly  substantial  computational  effort,  a  natural  question  was, 
“Why  not  just  bootstrap?”  A  chief  virtue  of  the  bootstrap  is  that  it  is  automatic, 
that  is,  it  requires  no  asymptotic  theory  to  derive,  nor  any  special  tricks  to  make 
it  work,  and  here  we  just  want  to  see  how  well  this  automatic,  or  untransformea. 
bootstrap  does  compared  with  the  simulated-process  method. 

Computations  were  carried  out  using  FORTRAN  programs  compiled  with  the  f77 
Unix  compiler  on  the  Florida  State  University  Statistics  Department  network  of  Sun 
computers.  The  computation  of  the  bootstrap  band  on  a  sample  of  size  80  required 
almost  2  minutes  of  CPU  time  on  a  Sparcstation  1;  computation  of  the  simulated- 
process  band  required  7  seconds.  A  study  of  bootstrap  bands  for  n=80  with  5.000 
simulations,  split  between  two  Sparcstations,  required  about  3.5  days  to  complete. 
Programs  and  subroutines  for  Cox  model  fitting  were  adapted  for  simulation  pur¬ 
poses  from  the  programs  given  in  Kalbfleisch  and  Prentice  (1980,  Appendix  3).  The 
algorithm  uses  the  usual  Peto  approximation  for  ties.  For  the  simulation  studies, 
pseudo-random  uniformly  distributed  variables  were  obtained  with  a  FORTRAN  im¬ 
plementation  of  the  “universal  random  number  generator”  of  Marsaglia  and  Zaman 
(1987).  This  random  number  generator  combines  a  lagged-Fibonacci  sequence  with 
a  simple  arithmetic  sequence.  It  has  a  period  of  about  2144  and  satisfies  stringent 
tests  of  randomness.  It  was  designed  so  that  in  all  CPU’s  with  at  least  16-bit  integer 
arithmetic,  given  the  same  four  starting  seeds,  the  same  sequence  of  uniform  vari¬ 
ables  is  produced.  Normal  and  exponential  variables  are  generated  by  the  ziggurat 
method  of  Marsaglia  and  Tsang  (19S4). 

Fortran  programs  to  calculate  all  the  bands  are  available  from  the  authors  on 
request. 


4  Simulation  Studies 

Here  we  report  results  of  a  Monte  Carlo  study  comparing  the  three  methods  of 
forming  bands,  and  of  Monte  Carlo  studies  comparing  the  three  estimators  of  £p(.r) 
which  were  introduced  in  Section  1. 

Table  1  gives  a  summary  of  results  of  a  Monte  Carlo  study  of  simulated-process 
bands  with  10,000  simulations,  and  of  another  Monte  Carlo  study  of  bootstrap  bands 
with  5.000  simulations.  Results  for  a  single  simulation  situation  are  given  here.  Re¬ 
sults  for  other  situations  were  similar  to  these.  In  the  situation  reported  on  here, 
there  is  one  covariate  x  which  is  evenly  spaced  on  the  interval  [0. 1  ].  The  lifetime  dis- 
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tribution  is  Exponential  1),  and  the  Cox  regression  parameter  30  is  I.  The  censoring 
distribution  is  also  exponential,  with  mean  set  to  give  the  desired  degree  of  censoring. 
In  these  studies,  n  =  SO  and  the  mean  of  the  censoring  variable  is  2.49.  which  gives 
20%  censoring.  The  number  of  bootstrap  samples  to  estimate  the  standard  deviation 
function  of  the  process  is  200;  the  number  of  bootstrap  samples  to  obtain  the  critical 
constants  is  599. 

In  this  situation,  the  bootstrap  bands  have  somewhat  higher  coverage  proba¬ 
bility  but  are  much  wider  than  the  simulated-process  bands.  The  log-transformed 
simulated-process  bands  have  slightly  lower  coverage  probability  and  are  slightly 
wider  than  the  standard  simulated-process  bands.  As  can  be  seen  from  the  table, 
the  simulated-process  equal-precision  bands  did  very  well.  The  mean  widths  appear 
large  for  all  the  methods,  compared  to  the  mean  survival  time,  which  ranges  from  1.0 
to  2.7  for  x  G  [0,1].  However,  these  means  include  the  widths  at  the  left  and  right 
extremes  of  the  covariate  values,  where  the  bands  can  be  very  wide,  and  therefore 
it  is  helpful  to  look  at  the  mean  critical  constants,  which  for  equal-precision  bands, 
show  how  much  precision  is  lost  by  going  from  confidence  interval  to  confidence  band. 
For  the  simulated-process  band,  the  mean  critical  constants  of  2.60  and  2.32  compare 
favorably  to  1.96  and  1.65  for  95%  and  90%  confidence  intervals. 

To  better  understand  the  relative  merits  of  the  two  methods,  we  felt  that  it  would 
be  useful  to  see  how  they  do  on  the  more  basic  problem  of  forming  confidence  intervals 
for  £p(x).  The  simulated-process  95%  confidence  interval  for  £p(x)  is  just  £p(x)  ± 
c.02sd(p,  x)jy/n  where  c.025  is  the  .975  quantile  of  the  standard  normal  distribution. 
Two  types  of  bootstrap  confidence  intervals  for  £p(x)  were  formed.  The  first  kind  was 
suggested  by  the  bootstrap  confidence  band  procedure,  and  we  studied  it  partly  as 
further  investigation  into  performance  of  the  bootstrap  band  procedure.  First  form 
many  bootstrap  samples,  and  get  values  £*(x)  —  £p(x)  from  each  bootstrap  sample. 
Take  the  .025  and  .975  quantiles  of  this  bootstrap  distribution  of  £‘(x)  —  £p(x). 
ci  and  Cu-  The  95%  confidence  interval  for  £p(x)  is  then  (<fp(x)  —  c^. sP(x)  ~  CL )■ 
(Remark:  In  forming  the  bootstrap  confidence  bands,  the  standard  error  function 
of  the  process,  a(p,  x),  is  estimated  before  carrying  out  the  bootstrap  procedure 
for  getting  critical  constants,  and  so  the  standard  error  at  a  single  x  is  constant 
throughout  this  second  stage  of  bootstrapping.  Thus  consideration  of  the  bootstrap 
distribution  of  (£'(x)  —  £p(x))/ct(p, x)  yields  the  same  intervals  as  just  described.) 
The  second  kind  of  bootstrap  interval  was  just  the  most  basic  type,  the  standard 
percentile  interval.  For  eleven  values  of  the  covariate  x,  evenly  spaced  on  the  interval 
[0,1],  the  simulated-process  intervals  and  bootstrap  percentile  intervals  had  coverage 
probabilities  very  close  to  95%.  However,  the  coverage  probabilities  of  the  first  kind 
of  bootstrap  interval  ranged  from  85%  to  S7%  over  the  11  x-values.  Computation 
of  average  critical  constants  ci  and  cu  for  this  type  of  bootstrap  interval  showed 
that  the  bootstrap  distribution  -  «*)  was  skewed  right.  Somehow,  in  spite 

of  the  low  coverage  probability  of  the  bootstrap  individual  confidence  intervals,  the 
features  of  the  skewed  bootstrap  distribution  and  the  taking  the  sup  over  a  range  of 
x's  combine  to  give  conservative  bootstrap  bands. 
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The  main  conclusion  of  these  studies  is  that  the  simulated-process  equal-precision 
bands  perform  surprisingly  well,  especially  considering  the  moderate  sample  size  used 
here.  It  was  not  clear  to  us  before  the  Monte  Carlo  studies  that  simulation  of  the 
estimated  process  would  be  a  feasible  approach  to  obtaining  critical  constants  for 
confidence  bands  in  general;  in  addition,  the  Cox  model  is  quite  complicated,  and  we 
didn't  know  how  estimation  of  the  hazard  rate  would  affect  the  performance  of  the 
bands. 

In  other  Monte  Carlo  studies  not  reported  here,  in  which  the  bin  widths  for 
the  kernel  estimator  of  hazard  rate  were  deliberately  chosen  to  be  not  good,  the 
simulated-process  method  often  had  low  coverage  probability.  The  log  transformed 
bands  were  then  a  noticeable  improvement,  and  of  course,  the  bootstrap  bands  were 
not  affected.  They  have  been  conservative  in  all  our  studies. 

Table  2  gives  the  results  of  several  Monte  Carlo  studies  comparing  the  three 
estimators  of  median  survival  time.  Continuous  versions  of  the  estimators  are  used, 
and  for  these  studies  the  aim  is  to  eliminate  the  choice  of  smoothing  iiicthod  as  a 
factor  in  the  comparative  performance  of  the  estimators.  B reslow "s  piecewise  linear 
smooth  of  the  cumulative  hazard  function  is  used  for  the  first  estimator,  ££(x),  and  to 
correspond  to  it,  piecewise  exponential  smoothing  of  the  product  integral  estimator 
of  the  survival  function  is  used  for  the  second  and  third  estimators,  f£(x)  and  ££(x). 
The  product  integral  estimator  of  the  survival  function  can  be  negative.  When  this 
occurred  in  these  studies,  the  estimator  of  the  median  of  the  survival  distribution 
was  found  by  linear  interpolation  between  the  largest  death  time  for  which  5  >  .5 
and  0. 

The  six  factors  determining  the  simulation  situation  are  the  survival  distribution, 
type  of  censoring  distribution,  percent  censoring,  covariate  distribution,  sample  size, 
and  value  of  the  Cox  regression  parameter  (5q.  In  these  studies,  the  censoring  distri¬ 
bution  is  always  exponential,  and  the  parameter  30  is  1.  There  Eire  two  levels  of  each 
of  the  other  four  factors.  The  two  choices  of  survival  distribution  are  Exponential  1) 
and  Weibull  with  shape  parameter  2.0  and  scale  parameter  1.35.  The  two  choices  of 
percent  censoring  are  10%  and  20%.  The  x's  are  fixed  in  the  studies.  For  one  level  of 
the  covariate  factor,  the  covariate  is  taken  evenly  spaced  on  the  interval  (0,1);  for  the 
other  level,  the  interval  is  (-1.1).  The  sample  sizes  are  20  and  50.  The  table  reports 
the  mean  integrated  squared  error.  By  ‘Integrated  squared  error"  is  meant  average 
squared  error  over  the  grid  of  x  values  considered.  The  number  of  simulations  in 
these  studies  is  10.000. 

The  overall  conclusions  are  that  the  two  product  integral  estimators  £~(x)  and 
fp(x)  outperform  the  original  estimator  £'(x)  in  terms  of  mean  integrated  squared 
error,  and  that  there  is  not  much  difference  between  ££(x)  and  £p(x),  although  on  the 
whole.  fp(x)  does  somewhat  better.  In  some  rows  of  Table  2,  ££(x)  has  much  smaller 
mean  integrated  squared  error  than  either  ££(x)  or  f£(x),  see  especially  rows  3.  7.  and 
15.  In  each  of  these  simulations,  the  covariate  x  ranges  from  —1  to  1;  there  is  less 
information  on  median  survival  time  for  the  small  covariate  values,  since  they  tend 
to  produce  large  survival  times.  Also,  these  simulations  had  a  small  sample  size  of 
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20.  In  this  situation,  there  is  a  norxnegiigible  probability  that  for  some  small  values 
of  x,  the  estimator  will  have  to  be  defined  by  extrapolation  beyond  the  last  observed 
death  time,  with  potential  for  large  outliers.  Therefore,  for  these  three  rows,  the  5% 
trimmed  means  of  the  integrated  squared  errors  are  also  reported  in  the  table.  The 
product  integral  estimator  ££(x)  still  shows  substantial  improvement  over  the  other 
two  estimators.  This  apparent  improvement  of  ££(x)  over  the  other  two  estimators 
here  may  be  due  merely  to  the  particular  choice  of  smoothing  algorithm  which  was 
used;  therefore,  we  do  not  claim  that  it  provides  as  large  an  improvement  as  the 
tabled  numbers  indicate.  The  empirical  evidence  from  these  studies  gives  support  for 
the  product  integral  method  beyond  the  theoretical  arguments  for  it. 

5  Illustration  of  the  Bands  on  Real  Data 

The  Stanford  Heart  Transplant  Data  have  appeared  frequently  in  the  statistics  lit¬ 
erature.  For  a  thorough  analysis  and  discussion,  see  the  discussion  paper  of  Aitkin, 
Laird,  and  Francis  (1983).  Here  we  do  not  attempt  to  add  any  insights  on  this  data 
set,  but  only  to  illustrate  the  methods  of  forming  bands  on  a  well-known  case.  We 
use  the  19S0  version  of  the  data,  as  given  in  Miller  and  Halpern  (19S2).  We  also 
follow  the  Miller  and  Halpern  Cox  model  analysis  of  the  data,  in  which  the  final 
model  was  a  quadratic  regression  of  log1Q  survival  time  on  the  covariate  age  for  the 
152  patients  with  complete  records  who  had  survived  at  least  10  days.  Figure  1  shows 
the  three  confidence  bands  for  the  median  log  survival  time.  The  kernel  estimate  of 
the  hazard  rate,  needed  by  the  two  types  of  simulated-process  bands,  used  constant 
bin  width  of  .27,  and  is  pictured  in  Figure  2.  The  bootstrap  bands  are  almost  always 
wider  than  the  simulated-process  bands  over  the  entire  range  of  the  covariate  age, 
except  at  the  far  left,  where  both  types  of  bands  are  so  wide  that  the  clear  message 
is  that  there  is  no  information  on  survival  time  distribution  for  these  very  young 
ages.  The  bootstrap  band  is  wider  than  the  simulated-process  bands  in  the  right 
half  of  the  graph,  but  it  is  more  appealing  than  the  other  two  bands  here  because 
it  is  smoother.  Its  wideness  goes  along  with  its  performance  in  the  Monte  Carlo 
studies.  The  unevenness  of  the  simulated-process  bands  in  the  right  end  of  the  graph 
corresponds  to  bumps  in  the  hazard  rate  estimator.  The  log-transformed  bands  are 
noticeably  different  from  the  untransformed  equal-precision  bands  at  the  left  end  of 
the  range  of  the  covariate;  otherwise  these  two  bands  are  similar.  To  give  an  idea  of 
the  width  of  the  bands,  upper  and  lower  endpoints  of  the  95%  bands  and  individual 
confidence  intervals  are  given  in  Table  3  at  two  values  of  age,  3S.5  and  4S.7  years. 
Considering  the  simulated-process  equal-precision  results  only,  we  see  that  although 
the  individual  intervals  barely  overlap,  the  bands  at  the  two  points  overlap  consid¬ 
erably,  so  that  one  can't  infer  from  the  band  a  definite  difference  in  median  survival 
time  for  the  two  ages.  Finally,  we  note  that  since  the  band  has  probability  .95  of 
containing  the  true  median  survival  time  simultaneously  for  all  x,  it  allows  one  to 
‘•snoop"  through  all  the  x  values  looking  for  interesting  significant  differences. 
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Appendix:  Proofs 

Theorems  1,  2,  3,  and  A.l  involve  the  space  C,+i(J?).  We  would  like  to  prove  that 
weak  convergence  takes  place  in  the  space  .D9+i([0.p^]  x  I\)  of  functions  defined  on 
(0,p*J']  x  K  which  are  “continuous  from  above,  with  limits  from  below”  with  the 
“Skorohod  topology”;  see  e.g.  Neuhaus  (1971).  However,  none  of  the  processes  £.(•), 
£.(•),  axid  £.(•)  need  be  in  this  space.  To  see  this,  let’s  look  at  the  case  of  p  fixed  and 
of  scalar  x.  Consider  for  example  £.(■)•  Recall  that 

This  is  the  composition  of  S~l  (which  is  right  continuous  with  left  limits  by  construc¬ 
tion)  with  a  decreasing  function  of  x.  The  result  is  a  left  continuous  function  with 
right  limits,  and  is  not  in  D(I\). 

So  we  are  forced  to  redefine  our  processes  to  make  them  continuous;  we  may  then 
just  as  well  work  with  the  simpler  spaces  and  C. 


Proof  of  Theorem  1 


Our  proof  is  based  on  a  general  result  concerning  weak  Bahadur  representations  for 
quantile  processes  (Proposition  A.l  below)  and  on  AG’s  weak  convergence  result  for 
\/n(A(-)  —  A(-),  /3  —  fi0).  We  first  deal  with  £p(x)  and  toward  the  end  of  the  proof  we 
switch  to  £p(x).  Let 


F(t\x)  =  1  -  n(l  -  A (dn))exp^'0, 

u<t 

and 

f(o=i-n(i-AW), 

u<t 


F(t\x)  =  1  -  f[(l  -  A(du))exp(0or) 

u<t 

(A.l) 

F(t) = i  -  nu  -  am) 


In  this  notation,  we  have 

v%(*)  -  (i  -  (i  -  f-'( i  -  (i  - 

Write  the  identity 


\Zn(iP(x)  ~  £P0))  =  Un (p,  x)  +  Tn{p,  x),  (A. 2) 


where 

X)  =  ■&(?-'  (l  -  (1  -  -  F-'  (l  -  (1  - 

and  (A. 3) 

rn(p,x)  =  \/n(^F~l  (l  -  (1  -  p)'*P<-*'*>)  -  F-1  (l  -  (1  -  p)«p<-<%*)^ . 


IS 


Throughout  this  proof,  (p,  x)  ranges  over  [O.pW]  x  A".  Theorems  3.2  and  3.4  of  AG 
state  that  in  D[0,  r  -f  e]  x  'R.q ,  asn->oo 

Vn(A(i)  -  -  |3„)  -i.  (H'(a(0)  -  &(i)'S-,/JZ,E-,'JZ).  (A.4) 

where  a(-),fe(-),  and  D  are  defined  by  (2.9)  and  (2.S),  and  D[0,r  +  e]  is  the  standard 
Skorohod  space. 

Consider  first  LTn(p,  x).  By  (A.4)  we  have  (via  (A.l)) 


vn(F(0  -  F(t),  -  3o)  -±->  (5(0(W(fl(0)  -  KtyZ-'/2Z),Z-l/2Z)  )  (A. 5) 


This  follows  from  the  easy  fact  that 


sup  \/n  (S(t)  —  S(i))  —  y/n  ^exp(— A(t))  —  exp(— A(t)))  0  (A. 6) 


0<<<T+e 


(S  and  5  are  the  survival  functions  corresponding  to  A  and  A.  respectively),  or 
somewhat  more  elegantly,  by  the  compact  differentiability  of  the  product  integral 
(Gill  and  Johansen  (19S7)):  (A. 5)  follows  from  (A.4)  by  the  delta  method.  To  deaf 
with  Un(p,x)  we  will  combine  (A. 5)  with  the  following  result. 

Proposition  A.l  (Doss  and  Gill,  1989)  Lei  £  be  a  function  defined  on  [0, 1]  that 
has  a  derivative  ('  which  is  positive  and  continuous.  Let  („,n  =  1,2,...  be  nonde¬ 
creasing  right  continuous  processes  on  [0,1]  satisfying  Cn(0)  =  ((0)  a. s.  and 

V^(Cn  ~  C)  K  in  .D[0, 1],  where  the  process  I\  has  a.s.  continuous  sample  paths. 
Then,  for  every  e  >  0 


sup  VnCCJCO-C  *(*))  + 


C(0)<«<c(i)-< 


(n ( C  1  (0)  —  t  \  I  pr 


C(C-HO)  /I 


0  (A.7) 


In  particular, 


in  Z)[ C(0),«l)-ej. 


We  now  combine  (A.7)  of  the  proposition  and  (A. 5),  and  obtain  that 


y/n(*?(x)  ~  &>(*))  = 


_ Qn(p,x) _ 

F'(f-'(i  -(1 


+  R{r!)(p,x)  AT„(p,i)  (A.S) 


where 


Qn(p,i)  =  -v^[F('F-1(l-(l-Pr 


exp(-J'x) 


-(i-(i-prT'-J'-->))  (A-9) 


sup  R\f\p.x)  -F-+  0 


(A. 10) 


19 


Let 


Qn(?,x)  =  -  (1  -(i  _p)«p(-^) 


We  now  apply  the  mean  value  theorem  to  the  term  Tn(p,x)  in  (A.S)  and  obtain 


\/n(£p(>)  -  £p(x))  = 


Qn(p,x) 


F'(f-1(i  -(1  -j>)«pM'* 


+  R{r!](p,x)  + 


(A. 12) 


xtrexp  {-3'0x)\'  r.th  ^  (2W  Qn(p,x)  —  Qn(p,  x 

- —  —  -  vMd  -  ,/50)  +  >(p,  x)  +  —7 - ; - 


MZp(x)) 


F'yF-1  (l  -  (1  _p)e*p(-4'*)) 


where 


sup  i?J^(p,  x)  0. 


Next,  we  proceed  to  show  that 


Qn(PiX)  -  Qn(p,x) 

F' (f-1  (l  -  (1  -  p)**^'*) 


(A. 13) 


(A. 14) 


Using  the  Skorohod  construction  we  may  assume  without  loss  of  generality  that  in 
(A. 5)  the  convergence  is  almost  sure  in  sup  norm  (since  W  is  continuous).  The 
continuity  of  W(a(-))  implies  that  if  {p„}  is  a  sequence  such  that  p„  — ►  0  a.s.,  then 
sup0<(<T+{  |W(a(t  -f  Tin))  —  W(a(t))|  — *  0  a.s.,  and  this  gives  the  convergence  in  prob¬ 
ability  in  (A. 14). 

Combining  (A. 12)  with  (A. 10),  (A. 13),  and  (A. 14)  we  obtain 


v^(£p(z)  -  £?(*))  = 


Qn(Pi  x)  (  X7T  exp(  — /?qI)  V 


VZ0-3o)-rR?(p.x)  (A.15) 


where 


sup  f?^3)(p,x)  TL,  o. 


(A. 16) 


(In  the  denominator  of  the  first  term  on  the  right  side  of  (A. 12),  the  change  resulting 
from  substituting  do  for  d  is  absorbed  into  R^(p,  x)). 

This  shows  that  the  finite  dimensional  distributions  of  \/n(£ p(x)  — £p(x))  converge 
to  those  of 

/Tr(q(£p(x)))  -  (6(gp(j))}  S~1/2Z\  ^  (riexp(-dox))  v-]. 


and  since  —TV*  —  TV',  this  means  that 


MW*)) 


the  finite  dimensional  distributions  of  v/n( £p(x  )  ~  sp(x)) 
converge  to  those  of  V(p,  x). 


(A. 17) 
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We  now  consider  tightness.  Let  u\.(-)  denote  the  continuity  modulus  of  the  func¬ 
tion  r.  The  weak  convergence  in  D  to  a  continuous  process  of  \/n(5  —  S)  implied  by 
(A. 5)  gives  the  following. 

For  every  e  >  0  limlimsupP  4 5_s)C^)  >  e}  —  0- 

6 — -0  n— *oo  v  v  J 

Therefore,  if  iZ'.(-)  denotes  the  continuity  modulus  of  a  function  defined  on  the 
(q  +  l)-dimensional  cube,  (A. 15)  and  (A. 16)  give 

for  every  e  >  0  limlimsup P  {tl.^p(r)_^(l))((5)  >  e)  =  0.  (A.1S) 

We  now  switch  to  ££(x),  which  is  what  the  theorem  actually  refers  to.  Note  that 
suPp,r  Vn(£p(x)  ~  ip(x))  0  (by  (A. 7)  of  Proposition  A.l  for  example),  and  so 
(A. 17)  is  still  true  if  we  replace  £p(x)  by  £Tx).  Moreover,  (A. 18)  implies  that 

for  every  e  >  0  lim limsupP  {*"/«({«(*)_<,(,))(*)  >  e}  =  0.  (A. 19) 

Now,  convergence  of  finite-dimensional  distributions  together  with  (A. 19)  are  enough 
to  give  the  desired  weak  convergence. 

Note:  We  have  made  use  of  the  fact  that  the  characterization  of  weak  convergence  in 
Cq+ 1  is  the  same  as  the  characterization  of  weak  convergence  in  C  given  in  Billingsley 
(1968,  pp.  54-55).  This  is  because  the  characterization  of  the  compact  sets  given  by 
the  Arzela-Ascoli  theorem  is  the  same  for  the  two  spaces  C  and  C9+1. 

Proof  of  Lemma  2.1 

Parts  (ii)  and  (iii)  are  Corollaries  3.5  and  3.3,  respectively,  of  AG.  The  proof  of  Part 
(i)  is  very  similar  (cf.  the  second  part  of  the  proof  of  Theorem  3.2  of  AG).  Part  (iv) 
follows  directly  from  Theorem  1.  Part  (v)  follows  from  the  results  of  Ramlau-Hansen 
(19S3);  see  in  particular  his  Theorem  4.1.2  and  also  lines  8-5  from  the  bottom  on 
page  460.  (Actually,  Ramlau-Hansen  obtains  uniform  consistency  of  kernel  estimates 
for  Aalen's  multiplicative  intensity  model.  However,  the  extension  to  the  Cox  model 
does  not  present  a  serious  difficulty.)  Part  (vi)  follows  from  Parts  (i)-(v).  The  proof 
of  Part  (vii)  is  straightforward. 

Proof  of  Theorem  2 

Theorem  2  follows  directly  from  Theorem  1  and  Part  (vii)  of  Lemma  2.1. 

Proof  of  Theorem  3 

Let  p'°)  be  fixed.  W  e  will  first  show  that  as  n  — +  oo 

L(p.x)  L(p.x)  in  C([p^.p^j  x  K).  (A. 20) 
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For  /  6  C([D(0),p(1)]  x  A‘)  we  shall  denote  suppe^(o)iPo)].xeA- |/(p,  x)|  by  ii/||.  To  obtain 
(A. 20)  assume  without  loss  of  generality  that  A  and  A  are  defined  on  a  common 
probability  space.  More  precisely,  let  (R,  P.  P )  be  the  probability  space  on  which  the 
random  vectors  (l',A’,,C,)  i  =  are  defined,  let  (f£.  T\  P')  be  a  probability 

space  on  which  TT(-)  and  Z  are  defined,  and  consider  the  product  space 
(fi  x  f£,P  x  T' ,  P  x  P').  On  this  probability  space  we  may  define  (versions  of)  L  and 
A  by  (2.13)  and  (2.15),  respectively.  (The  same  W(-)  and  Z  are  used  in  the  definition 
of  L  and  L .)  Note  that  \\L  —  L\\  — £  0.  This  follows  from  the  continuity  of  W(-)  and 
Lemma  2.1.  This  gives  (A. 20)  and  therefore  that 

11*11 (A. 21) 
Letting  Ln(p,x)  =  y/n(fp(x)  —  fp(x))/d(p,  x),  Theorem  2  implies  that 

Pnll  —  ||I||  (A- 22) 

By  Theorem  1  of  Tsirel'son  (1975)  the  distribution  of  ||A||  (and  also  that  of  |jA||)  is 
continuous.  Therefore,  by  (A. 21)  and  (A. 22) 


sup 

-oo<f  <oo 


p{|!Xn||<f}-P{||L|!<t} 


0 


(A. 23) 


Recalling  that  is  a  (1  —  a)th  quantile  of  |jA||,  we  see  that  (A. 23)  implies  that 
P { 1 1  An 1 1  <  s^}  — ►  1  —  a,  and  this  proves  the  theorem. 

Note:  is  approximated  through  a  simulation.  However,  it  is  easy  to  see  that  this 

does  not  cause  any  problem  in  the  theory. 


Proof  of  Theorem  4 

To  prove  Theorem  4  we  shall  first  prove  Theorem  A.l,  which  states  that  as  n  —  oc. 
the  distribution  of  y/n(£’(x)  —  <fp(x))  converges  to  the  limiting  distribution  of 
>/n(£p(x)  —  £p(x)).  We  express  this  notion  of  “weak  convergence  in  probability7-  in 
terms  of  the  Prohorov  metric  d  on  the  space  of  all  probability  measures  on  C9+1. 
This  metric  is  defined  in  Billingsley  (196S.  pp.  236-23S).  The  feature  of  it  that  we 
will  use  is  that  it  metrizes  weak  convergence:  If  pn  and  p  are  probability  measures 
on  Cf+lf  then  d(^n,p)  -»  0  pn-^p. 

We  will  use  the  following  notation.  If  £(p,  x)  is  a  process  in  C,+1 ,  then  £(((p,x)) 
and  £(((p,  x)|data)  denote  the  distribution  of  C(p,  x)  and  the  conditional  distribution 
of  C(p,x)  given  the  data,  respectively.  Here,  of  course,  the  data  is  (7i,  A’,),  i  = 


Theorem  A.l  Assume  the  conditions  of  Theorem  1.  Then 

d(^£(x/ri{CP~{x)  -  <fp(x))|data).  £(V(p.i))^)  —  0  (A. 24) 

where  V(p,x)  is  the  process  defined  m  Theorem  1. 
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The  proof  of  Theorem  A.l  is  similar  to  that  of  Theorem  i.  The  main  ch arises  are 
the  following. 

1  We  use  Hjort's  (19S5a)  result  on  the  consistency  of  the  bootstrap  in  the  Cox  model, 
instead  of  AG's  weak  convergence  result  (A. 4). 

2  We  use  a  version  of  Proposition  A.l  appropriate  for  the  bootstrap. 

3  We  use  elementary  bounds  on  the  size  of  the  largest  jump  of  the  bootstrap  version 
of  S. 

Instead  of  indicating  where  changes  need  to  be  made,  we  give  the  entire  proof  for  the 
sake  of  completeness. 

Recall  that  A*  and  3 "  are  the  same  as  A  and  3  except  that  they  are  calculated 
from  the  bootstrap  sample.  Define 

F’ftji)  =  1  —  n(l  —  A m(du))exp^  and  Fm(t)  =  1  —  J"J(1  —  A m(du))  (A. 25) 

u<t  l :<t 

We  have 

vn({»  -  iP(x>)  =  v/n(r*-!(l  -  C  -  -  F~1  (l  -  (1  -  pC xp‘-“  r))). 

As  in  the  proof  of  Theorem  1,  we  will  use  the  decomposition 

\/n(sp(~)  -  f„(x))  =  U’ip^x)  -f-  T‘(p.  x),  (A. 26) 

where 

C‘{p.x)  =  x/n(F-l(  1  -  (1  -  F~l  (l  -  (1  _p)« 

and  (A. 27) 

r;fp.i)  =  vn(F-:(i  -  n  -  P)expi-d"x))  -f~'( i  -  ( i  -  p,exp(-''r,)V 

The  proposition  of  Section  3  of  Hjort  ( 1 9S5a )  states  that  under  the  assumption  that 
'he  triples  (T,.  <5,.  A",),  :  —  1.2.  . .  .  are  iid. 

C  f\/n(  .i'  —  ^)|dataj  — D.  £(  ‘Z  )  a.s.  A.2S 

Note  that  the  quantity  to  th.  left  of  the  arrow  in  { A.2S)  is  a  function  of  the  data. 

and  the  expression  "a.s."  refers  to  the  sequence  ( 2~t .  S, .  A", ).  i  =  1.2 .  Define  IP* 

and  Z*  by 

T  V'(i)  =  Wfait))  -  b(t)W-U2Z  and  Z'  =  Z'i/2Z.  <  A.  29) 

A  careful  look  at  Hjort's  proof  reveals  the  following. 

1  Statement  (A.2S)  can  be  strengthened  to 

C( \/n(  A*  —  A .3’  —  hjjdata)  -A—  £(IT ’*.Z')  a.s. 

1  cf.  lines  10-12  from  the  bottom  on  page  5  of  Hjort.  19S5a).  which  is  equivalent  to 

d\ c(\/r>{  A‘  —  A,  3"  —  b'jjdata).  £  flPT.  Z')')  - *0  a.s..  !  A. 30) 

where  a  is  the  Prohorov  metric  on  the  space  of  all  probability  measures  on 
C'O.  r  -  ej  x  727+I. 
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-  If  the  iid  assumption  is  removed,  'hen  (A. 30)  is  weakened  to 

c^£(v/n( A"  -  A  Jm  -  0.  (A.31, 

which  is  obtained  by  working  with  subsequences. 

Let  |n;. }  be  any  subsequence  of  { n } .  To  show  (A. 24)  it  suffices  to  show  that  {n;;} 
has  a  further  subsequence  along  which  the  convergence  statement  (A. 24)  holds  a.s. 
By  AG’s  weak  convergence  result  (A. 4)  and  by  Iljort's  result  (A.31),  there  exists  a 
subsequence  of  {n-K}  along  which,  with  probability  one, 

!  |  A  —  A 1 1 — *0  3 — •  3  (A. 32) 

and 

d(c  (y/Z(Am  -  A.  S'  -  J)|data)..c(U"t.Zt))  — ♦  0  (A. 33) 

( =  sup  I  f\,  where  the  set  over  which  the  sup  is  taken  is  obtained  from  context), 
m  order  to  obtain  the  bootstrap  analogue  of  (A. 5),  we  need  the  following  lemma. 

e  use  the  notation  A/(t)  =  f(t)  —  f(t-)  for  any  right-continuous  function  with  left 
limits. 

Lemma  A.l  Under  the  assumptions  of  Theorem  4 

1  ||n3/<4 AA*jj  - *  0  in  bootstrap  probability,  a.s. 

2  |ln3^A5*M  — *  0  in  bootstrap  probability,  a.s. 

Lemma  A.l  follows  easily  from  Lemma  A. 2  below. 

Lemma  A. 2  Let  Ai,  A?, . . .  be  iid  from  a  continuous  distribution  F.  and  let  Fn  be 

the  empirical  distribution  function  of  A"1,....A'n.  Let  A'*,...,  A'*  be  an  iid  sample 

jrom  Lb,  and  let  Ff  be  the  empirical  distribution  function  of  X l . A'*.  Then 

nj/,Air‘2  - 0  in  bootstrap  probability,  a.s. 

Proof.  Let  e  >  0.  IVe  have 

P{l!?i3/’,AFn'!|  >  £}  <  nP{nuF‘(A’i)  >  n1/4e)  <  2n  exp( -n1/4e/3). 

the  iast  inequality  being  a  consequence  of  Bernstein's  inequality  (see.  e.g.  page  95  of 
Serfiir.g.  1GS0). 

Consider  now  S'  -  exp(  —  A"),  lie  have 
v  n(S'rt)  —  exp(  —  A*  ( / ) ))  =  Vn  \  11(1  -  a?A»)  -  exp( 

(i.<t 
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the  inequality  in  (A. 34)  following  from  LeCam  (1960).  By  Part  2  of  Lemma  A.l,  the 
last  expression  in  (A. 34)  converges  to  0  in  bootstrap  probability.  Combining,  we  see 
that  there  exists  a  set  Q0  of  probability  one,  and  a  subsequence  {n^ }  such  that 

d(c(y/Z(F-  -  FJ’  -d)|data),  £(SW1,Z')^  — >  0,  ||A-A||  — *  0. 

and  J  - *  Q  along  {n^}  for  all  data  points  in  Q0.  (A. 35) 

Theorem  2  of  Doss  and  Gill  (1989)  gives  a  version  of  Proposition  A.l  that  is 
appropriate  for  the  bootstrap.  Applied  to  our  situation,  it  implies  that  there  exists 
a  subset  fL  of  Do,  also  of  probability  one,  and  a  subsequence  }  (which  we  will 
denote  {m}),  such  that  along  {m},  for  each  data  point  in  Di 


F"(F~l)  -  F(F~l) 
F'{F~l) 


) 


0 


(A. 36) 


in  bootstrap  probability.  Therefore,  along  {m},  for  each  data  point  in  analogous 
to  ( A . S )  we  have 


Vn(i‘p(x)  -  £p(x))  =  Q’n(p,x )  -r  R"n{p..x)  +  Tn*(p,x)  (A^37) 

where 

0;(p.x)  =  -y/n(F'(F-l{  1  -  (1  -P)exp(-^)))  -  (l  -  (1  _p)e*P(-3'r)))_ 

and 

sup  i.R‘(p.  x)|  - +  0  in.  bootstrap  probability. 

p.z 

lo  deal  with  the  term  T^(p.x)  we  need  the  following  fact.  If  is  an  arbitrary 
sequence  of  positive  constants  tending  to  0  then 

sup  | %/n(F~l(t)  —  F~l(u))  —  \/n(F~l(t)  —  F-1(u))|  -T-+  o  (A.3S) 

l(-UI<Cn 

winch  is  proved  through  a  standard  Skorohod  construction  argument  using  the  conti¬ 
nuity  of  the  process  to  which  y/n(F~'  —  F~l )  converges.  Therefore,  there  exists  a  set 
of  probability  one  and  a  subsequence  (which  we  shall  also  call  {m}  and  D i,  respec¬ 
tively;  such  that  (A.3S)  holds  along  {m},  for  all  data  points  in  fij.  This  shows  that 
T‘  can  be  handled  in  the  same  way  that  Tn  was  handled,  and  we  conclude  that  along 
{m}.  for  all  data  points  in  filT  the  finite  dimensional  distributions  of  v/n(if’(x)  —  £p(x)) 
converge  to  those  of  V(p.x). 

For  the  rest  of  the  proof  we  continue  to  work  with  the  subsequence  {m}  and 
fixed  data  point  in  Q].  The  remaining  arguments  are  essentially  identical  to  those 
of  Theorem  1.  the  only  difference  being  in  proving  that  c{x)  -  £’(x))  — •  0  in 

bootstrap  probability  along  {m}  for  all  data  points  in  fij.  This  fact  is  a  consequence 
of  (A. 36)  and  of  Part  2  of  Lemma  A.l. 
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Proof  of  Theorem  4 

Theorem  1  of  Tsirel'son  (1975)  implies  that  the  distribution  of  -  )/cr(-.  -)||  is 

continuous.  Therefore. 


ii£(-) -a-) 


(A. 39) 


uniformly  in  t.  Working  with  uniformly  convergent  subsequences,  we  obtain  a  similar 
statement  for  the  bootstrap  analogue  of  (A. 39).  This  proves  the  theorem. 
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Table  3.  95%  bands  and  intervals  in  SHT 
data,  for  median  survival  time  in  years. 


Age  (Years)  Band  Interval 

35l>  SPl  (1.6,  9.7)  (2.4,  6.7) 

SP2  (1.7,  10.2)  (2.4,  6.9) 
B  (1.8,  9.1)  (2.3,  5.6) 

48.7  SPl  (.5,  4.8)  (.8,  2.8) 

SP2  (.5,  5.3)  (.8,  2.9) 
B  (.3,  6.4)  (.6.  2.2) 
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Figure  1.  95%  confidence  bands  for  median  survival  time  in  SHT  data 
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